Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The pre-training models proposed for different modalities show a rising trend of homogeneity in their model structures, which creates the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. Since almost all common modules are provided for each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
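To make the five-component decomposition concrete, here is a minimal sketch of how such a model could be assembled from interchangeable modules; the class and argument names are illustrative assumptions, not TencentPretrain's actual API.

```python
# Hypothetical sketch of the five-component modular assembly described above.
import torch
import torch.nn as nn

class PretrainModel(nn.Module):
    """A pre-training model assembled from five interchangeable components."""
    def __init__(self, embedding, encoder, target_embedding=None, decoder=None, target=None):
        super().__init__()
        self.embedding = embedding                 # e.g. word / patch / speech embedding
        self.encoder = encoder                     # e.g. Transformer, LSTM, ...
        self.target_embedding = target_embedding   # only used by encoder-decoder models
        self.decoder = decoder
        self.target = target                       # pre-training objective, e.g. an MLM head

    def forward(self, src, tgt_in=None, labels=None):
        hidden = self.encoder(self.embedding(src))
        if self.decoder is not None:
            hidden = self.decoder(self.target_embedding(tgt_in), hidden)
        return self.target(hidden, labels)         # returns the pre-training loss

# A toy text-only assembly (the target head is a stand-in, not a real MLM loss):
model = PretrainModel(
    embedding=nn.Embedding(30000, 256),
    encoder=nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True), 2),
    target=lambda hidden, labels: hidden.mean(),
)
loss = model(torch.randint(0, 30000, (2, 16)))
```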
Fact verification has attracted a lot of research attention recently, e.g., in journalism, marketing, and policymaking, as misinformation and disinformation online can sway one's opinion and affect one's actions. While fact-checking is a hard task in general, in many cases, false statements can be easily debunked based on analytics over tables with reliable information. Hence, table-based fact verification has recently emerged as an important and growing research area. Yet, progress has been limited due to the lack of datasets that can be used to pre-train language models (LMs) to be aware of common table operations, such as aggregating a column or comparing tuples. To bridge this gap, in this paper we introduce PASTA, a novel state-of-the-art framework for table-based fact verification via pre-training with synthesized sentence-table cloze questions. In particular, we design six types of common sentence-table cloze tasks, including Filter, Aggregation, Superlative, Comparative, Ordinal, and Unique, based on which we synthesize a large corpus consisting of 1.2 million sentence-table pairs from WikiTables. PASTA uses a recent pre-trained LM, DeBERTaV3, and further pretrains it on our corpus. Our experimental results show that PASTA achieves new state-of-the-art performance on two table-based fact verification benchmarks: TabFact and SEM-TAB-FACTS. In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).
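As an illustration of the pre-training corpus construction, the following sketch synthesizes a single Aggregation-type sentence-table cloze pair; the template and field names are assumptions rather than PASTA's released code.

```python
# Illustrative sketch of synthesizing one Aggregation-type sentence-table cloze pair.
def synthesize_aggregation_cloze(table, column):
    """Build a cloze sentence whose masked token must be computed from the table."""
    values = [float(row[column]) for row in table["rows"]]
    answer = sum(values)
    sentence = f"the total {column} of all rows in {table['caption']} is [MASK]"
    return {"sentence": sentence, "table": table, "answer": answer}

table = {
    "caption": "2012 olympic medals",
    "rows": [{"nation": "usa", "gold": 46}, {"nation": "china", "gold": 38}],
}
example = synthesize_aggregation_cloze(table, "gold")   # answer: 84.0
```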
In this paper, we study the new topic of object effect recommendation on micro-video platforms, a challenging but important task for many practical applications such as advertisement insertion. To avoid the background bias introduced by learning video content directly from image frames, we propose to exploit the meaningful body language hidden in 3D human poses for recommendation. To this end, a novel human pose-driven object effect recommendation network, termed PoseRec, is introduced in this work. PoseRec leverages the advantages of 3D human pose detection and learns information from multi-frame 3D human poses for video-item registration, leading to high-quality object effect recommendation performance. Moreover, to address the inherent ambiguity and sparsity problems in object effect recommendation, we further propose a novel item-aware implicit prototype learning module and a novel pose-aware transductive hard-negative mining module to better learn pose-item relationships. More importantly, to benchmark methods for this new research topic, we build a new dataset for object effect recommendation, named Pose-OBE. Extensive experiments on Pose-OBE demonstrate that our method achieves superior performance over strong baselines.
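A minimal sketch of mining hard negatives from pose-item similarity scores, in the spirit of the mining module described above; the scoring and selection details are assumptions, not PoseRec's actual implementation.

```python
# Hedged sketch: select the most similar non-positive items as hard negatives.
import torch
import torch.nn.functional as F

def mine_hard_negatives(pose_emb, item_emb, positive_idx, k=5):
    """Return the indices of the k most similar non-positive items."""
    sim = pose_emb @ item_emb.T              # (num_items,) similarity scores
    sim[positive_idx] = float("-inf")        # exclude the ground-truth item
    return torch.topk(sim, k).indices

pose_emb = F.normalize(torch.randn(64), dim=0)
item_emb = F.normalize(torch.randn(100, 64), dim=-1)
hard_negs = mine_hard_negatives(pose_emb, item_emb, positive_idx=7)
```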
Point cloud-based large-scale place recognition is an important but challenging task for many applications such as simultaneous localization and mapping (SLAM). Casting the task as a point cloud retrieval problem, previous methods have achieved promising results. However, how to handle the catastrophic collapse caused by rotation is still insufficiently explored. In this paper, to address this issue, we propose a novel point cloud-based rotation-robust large-scale place recognition network (RPR-Net). In particular, we propose to learn rotation-invariant features in three steps. First, we design three kinds of novel rotation-invariant features (RIFs), which are low-level features that preserve rotation invariance. Second, using these RIFs, we design an attentive module to learn rotation-invariant kernels. Third, we apply these kernels to previous point cloud features to generate new features, which is the well-known SO(3) mapping process. By doing so, high-level scene-specific rotation-invariant features can be learned. We call the above process Attentive Rotation-Invariant Convolution (ARIConv). To achieve the place recognition goal, we build RPR-Net, which takes ARIConv as the basic unit to construct a dense network architecture. Powerful global descriptors for retrieval-based place recognition can then be fully extracted from RPR-Net. Experimental results on prevalent datasets show that our method achieves results comparable to existing state-of-the-art place recognition models and significantly outperforms other rotation-invariant baseline models when the rotation problem is present.
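A small sketch of low-level rotation-invariant point features (distances and angles are unchanged by any rotation); these are generic examples of such features, not necessarily the three RIFs defined in the paper.

```python
# Generic rotation-invariant features for a point and its neighbors.
import numpy as np

def rotation_invariant_features(point, neighbors):
    centroid = neighbors.mean(axis=0)
    d_point = np.linalg.norm(neighbors - point, axis=1)        # neighbor-to-point distances
    d_centroid = np.linalg.norm(neighbors - centroid, axis=1)  # neighbor-to-centroid distances
    v1 = neighbors - point                                     # angle between (neighbor - point)
    v2 = centroid - point                                      # and (centroid - point)
    cos_angle = (v1 @ v2) / (np.linalg.norm(v1, axis=1) * np.linalg.norm(v2) + 1e-8)
    return np.stack([d_point, d_centroid, cos_angle], axis=1)  # (k, 3), invariant to SO(3)

# Sanity check: features are identical before and after a random orthogonal transform.
rng = np.random.default_rng(0)
pts = rng.standard_normal((8, 3)); p, nbrs = pts[0], pts[1:]
rot, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.allclose(rotation_invariant_features(p, nbrs),
                   rotation_invariant_features(p @ rot.T, nbrs @ rot.T))
```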
Point cloud-based large-scale place recognition is fundamental for many applications such as simultaneous localization and mapping (SLAM). Although many models have been proposed and have achieved good performance by learning short-range local features, long-range contextual features are often neglected. Moreover, model size has become a bottleneck for their wide application. To overcome these challenges, we propose a super-lightweight network model, termed SVT-Net, for large-scale place recognition. Specifically, on top of efficient 3D sparse convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features in this model. Composed of ASVT and CSVT, SVT-Net achieves state-of-the-art results on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art results while further reducing the model size to 0.8M and 0.4M, respectively.
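A generic sketch of self-attention over sparse voxel features, illustrating how long-range context can be captured on top of sparse convolution; this is not the actual ASVT/CSVT design.

```python
# Generic self-attention over occupied voxel features to capture long-range context.
import torch
import torch.nn as nn

class VoxelSelfAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feats):              # (batch, num_occupied_voxels, dim)
        out, _ = self.attn(voxel_feats, voxel_feats, voxel_feats)
        return self.norm(voxel_feats + out)      # residual connection

feats = torch.randn(2, 512, 64)                  # e.g. 512 occupied voxels, 64-dim features
ctx = VoxelSelfAttention(64)(feats)
```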
Transformer-based models have gained large popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore learning attention in frequency domains (e.g., the Fourier domain, the wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., a linear kernel applied to attention scores). Empirically, we analyze how attention models of different domains behave differently through various synthetic experiments with seasonality, trend, and noise, with an emphasis on the role of the softmax operation therein. Both these theoretical and empirical analyses motivate us to propose a new method: TDformer (Trend Decomposition Transformer), which first applies seasonal-trend decomposition and then additively combines an MLP that predicts the trend component with Fourier attention that predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
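A hedged sketch of the high-level TDformer structure: moving-average seasonal-trend decomposition followed by additive recombination of a trend prediction and a seasonal prediction; the two predictors below are simple placeholders, not the paper's modules.

```python
# Sketch: decompose, predict each component, then recombine additively.
import torch
import torch.nn as nn
import torch.nn.functional as F

def decompose(x, kernel=25):
    """Split a series (batch, length) into trend (moving average) and seasonal parts."""
    pad = kernel // 2
    trend = F.avg_pool1d(F.pad(x.unsqueeze(1), (pad, pad), mode="replicate"),
                         kernel, stride=1).squeeze(1)
    return trend, x - trend

class TDLikeForecaster(nn.Module):
    def __init__(self, in_len, out_len):
        super().__init__()
        self.trend_mlp = nn.Sequential(nn.Linear(in_len, 128), nn.ReLU(), nn.Linear(128, out_len))
        self.seasonal_head = nn.Linear(in_len, out_len)     # stand-in for Fourier attention

    def forward(self, x):
        trend, seasonal = decompose(x)
        return self.trend_mlp(trend) + self.seasonal_head(seasonal)   # additive recombination

y_hat = TDLikeForecaster(in_len=96, out_len=24)(torch.randn(8, 96))
```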
Neural network pruning has been a well-established compression technique to enable deep learning models on resource-constrained devices. The pruned model is usually specialized to meet specific hardware platforms and training tasks (defined as deployment scenarios). However, existing pruning approaches rely heavily on training data to trade off model size, efficiency, and accuracy, which becomes ineffective for federated learning (FL) over distributed and confidential datasets. Moreover, the memory- and compute-intensive pruning process of most existing approaches cannot be handled by most FL devices with resource limitations. In this paper, we develop FedTiny, a novel distributed pruning framework for FL, to obtain specialized tiny models for memory- and computing-constrained participating devices with confidential local data. To alleviate biased pruning due to unseen heterogeneous data over devices, FedTiny introduces an adaptive batch normalization (BN) selection module to adaptively obtain an initially pruned model to fit deployment scenarios. Besides, to further improve the initial pruning, FedTiny develops a lightweight progressive pruning module for local finer pruning under tight memory and computational budgets, where the pruning policy for each layer is gradually determined rather than evaluating the overall deep model structure. Extensive experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art baseline approaches, especially when compressing deep models to extremely sparse tiny models.
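A rough sketch of the layer-wise progressive pruning idea, where a layer's sparsity is raised in small increments toward a target budget instead of scoring the whole model at once; the policy shown is illustrative, not FedTiny's exact module.

```python
# Hedged sketch: prune one layer's smallest-magnitude weights in small increments.
import torch
import torch.nn as nn

def prune_layer_progressively(layer, step=0.1, target_sparsity=0.8):
    """Zero out the smallest-magnitude weights of one layer step by step."""
    w = layer.weight.data
    current = (w == 0).float().mean().item()
    while current + 1e-6 < target_sparsity:
        current = min(current + step, target_sparsity)
        k = int(current * w.numel())
        threshold = w.abs().flatten().kthvalue(k).values
        w[w.abs() <= threshold] = 0.0
        # ...a real system would briefly fine-tune on local data between steps...
    return layer

layer = prune_layer_progressively(nn.Linear(256, 256))
print((layer.weight == 0).float().mean())    # roughly 0.8
```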
Given rapidly changing machine learning environments and expensive data labeling, semi-supervised domain adaptation (SSDA) is necessary when labeled data from a source domain are statistically different from partially labeled data in a target domain. Most prior SSDA research is conducted in a centralized manner, requiring access to both source and target data. However, data in many fields are nowadays generated by distributed end devices. Due to privacy concerns, the data may be stored locally and cannot be shared, rendering existing SSDA research ineffective. This paper proposes an innovative approach to achieve SSDA over multiple distributed and confidential datasets, named Federated Semi-Supervised Domain Adaptation (FSSDA). FSSDA integrates SSDA with federated learning based on a strategically designed knowledge distillation technique, and improves efficiency by performing source and target training in parallel. Moreover, FSSDA controls the amount of knowledge transferred across domains by properly selecting a key parameter, i.e., the imitation parameter. Further, the proposed FSSDA can be effectively generalized to multi-source domain adaptation scenarios. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the FSSDA design.
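A minimal sketch of knowledge distillation with an imitation parameter that controls how much cross-domain knowledge is transferred; the formulation is illustrative, not FSSDA's exact objective.

```python
# Sketch: blend supervised loss with imitation of a cross-domain teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, imitation=0.5, T=2.0):
    """Larger imitation values transfer more knowledge from the teacher domain."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return (1 - imitation) * ce + imitation * kd

loss = distillation_loss(torch.randn(16, 10), torch.randn(16, 10),
                         torch.randint(0, 10, (16,)))
```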
Our goal in this study is to investigate a more realistic setting in which weakly-supervised multi-modal instance-level product retrieval is performed over fine-grained product categories. We first contribute the Product1M dataset and define two practical instance-level retrieval tasks to enable evaluation for price comparison and personalized recommendation. For both instance-level tasks, accurately pinpointing the product targets mentioned in vision-language data and effectively reducing the influence of irrelevant content is very challenging. To address this, we train a more effective cross-modal pretrained model that can adaptively incorporate key concept information from multi-modal data by using an entity graph, whose nodes and edges represent entities and similarities, respectively. Specifically, a novel Entity-Graph Enhanced Cross-Modal Pretraining (EGE-CMP) model is proposed for instance-level commodity retrieval, which explicitly injects entity knowledge in both node-based and subgraph-based ways into a self-supervised hybrid-stream transformer. This reduces the confusion between different object contents and effectively guides the network to focus on entities with real semantics. Experimental results well verify the efficacy and generalizability of our EGE-CMP, which outperforms several SOTA cross-modal baselines such as CLIP, UNITER, and CAPTURE.
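An illustrative sketch of the entity-graph idea, where nodes are entities and edges connect entities with similar embeddings; the construction details are assumptions, not EGE-CMP's actual pipeline.

```python
# Sketch: build an adjacency matrix linking entities with high cosine similarity.
import torch
import torch.nn.functional as F

def build_entity_graph(entity_embeddings, threshold=0.7):
    """Nodes are entities; an edge exists where cosine similarity exceeds the threshold."""
    normed = F.normalize(entity_embeddings, dim=-1)
    adj = (normed @ normed.T > threshold).float()
    adj.fill_diagonal_(0)                         # no self-loops
    return adj

adj = build_entity_graph(torch.randn(50, 128))    # 50 entities, 128-dim embeddings
```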
Neural network pruning has been an essential technique for reducing the computation and memory requirements of deep neural networks on resource-constrained devices. Most existing studies focus mainly on balancing the sparsity and accuracy of the pruned neural network by strategically removing insignificant parameters and retraining the pruned model. However, the privacy risks that such efforts pose to training samples, which can be serious due to increased memorization, have not yet been investigated. In this paper, we conduct the first analysis of privacy risks in neural network pruning. Specifically, we investigate the impact of neural network pruning on training data privacy, i.e., membership inference attacks. We first explore the impact of neural network pruning on prediction divergence, where the pruning process disproportionately affects the pruned model's behavior on members versus non-members. Meanwhile, the influence of this divergence varies among different classes in a fine-grained manner. Motivated by such divergence, we propose a self-attention membership inference attack against pruned neural networks. Extensive experiments are conducted to rigorously evaluate the privacy impact of different pruning methods, sparsity levels, and adversary knowledge. The proposed attack achieves higher attack performance on pruned models than eight existing membership inference attacks. In addition, we propose a new defense mechanism that protects the pruning process by mitigating the prediction divergence based on the KL-divergence distance; experiments show that it effectively reduces the privacy risks while maintaining the sparsity and accuracy of the pruned models.
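A hedged sketch of the KL-divergence-based defense idea: penalizing divergence between the pruned and original models' predictions during fine-tuning; this illustrates the principle, not the paper's exact objective.

```python
# Sketch: add a KL penalty so pruning does not amplify member/non-member behavioral gaps.
import torch
import torch.nn.functional as F

def defended_finetune_loss(pruned_logits, original_logits, labels, alpha=1.0):
    task_loss = F.cross_entropy(pruned_logits, labels)
    divergence = F.kl_div(F.log_softmax(pruned_logits, dim=-1),
                          F.softmax(original_logits, dim=-1),
                          reduction="batchmean")
    return task_loss + alpha * divergence          # smaller divergence -> less leakage

loss = defended_finetune_loss(torch.randn(32, 10), torch.randn(32, 10),
                              torch.randint(0, 10, (32,)))
```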